Horovod
References
- #PAPER Horovod: fast and easy distributed deep learning in TensorFlow (Sergeev 2018)
- #CODE https://github.com/horovod/horovod
- https://horovod.readthedocs.io/en/latest/keras.html
- https://horovod.readthedocs.io/en/stable/tensorflow.html
- https://eng.uber.com/horovod/
- Horovod is a distributed training framework for TensorFlow, Keras, PyTorch, and MXNet whose goal is to make distributed deep learning fast and easy to use. It is hosted by the LF AI Foundation (Linux Foundation AI). Horovod inserts allreduce operations into the back-propagation step to average the computed gradients across workers, which is what allows training to scale over multiple GPUs. It is based on Baidu's ring allreduce (http://andrew.gibiansky.com/blog/machine-learning/baidu-allreduce/); see the sketch after this list.
- [Not straightforward from JupyterLab](https://github.com/horovod/horovod/issues/622); a possible workaround is ipyparallel.
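As a concrete illustration of the allreduce-based gradient averaging described above, here is a minimal sketch using Horovod's Keras API, assuming TensorFlow 2.x. The dataset, model, and hyperparameters are placeholders for illustration, not taken from the paper:

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one Horovod process per GPU

# Pin each process to a single local GPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
dataset = (tf.data.Dataset.from_tensor_slices(
               (x_train[..., tf.newaxis] / 255.0, y_train))
           .shard(hvd.size(), hvd.rank())  # each worker gets a distinct shard
           .shuffle(10000)
           .batch(128))

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Scale the learning rate by the number of workers, then wrap the optimizer;
# the wrapper averages gradients across workers via allreduce during backprop.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

callbacks = [
    # Broadcast rank 0's initial weights so all workers start in sync.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

model.fit(dataset, epochs=3, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)  # log only on rank 0
```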
Examples
- https://github.com/horovod/horovod/tree/master/examples
- https://horovod.readthedocs.io/en/stable/running_include.html
- https://github.com/horovod/tutorials/blob/master/fashion_mnist/README.md
- Distributed Deep Learning with Horovod (Jordi Torres)
- https://towardsdatascience.com/a-quick-guide-to-distributed-training-with-tensorflow-and-horovod-on-amazon-sagemaker-dae18371ef6e
- With the SLURM workload manager, e.g. on the BSC-P9 cluster. See the paper Ramirez-Gargallo 2019 in AI/Data Engineering/Distributed DL, and the batch-script sketch after this list.
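For reference, a Horovod job is normally launched with `horovodrun`, e.g. `horovodrun -np 4 python train.py` for four local processes. Under SLURM, `srun` typically takes over process launching, starting one Horovod process per task. Below is a hypothetical batch-script sketch; the node/GPU counts, module name, and `train.py` are placeholders to adapt to the cluster at hand:

```bash
#!/bin/bash
# Hypothetical SLURM script for a 2-node, 4-GPUs-per-node Horovod job.
#SBATCH --job-name=hvd-train
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4    # one task (Horovod process) per GPU
#SBATCH --gres=gpu:4
#SBATCH --time=01:00:00

module load openmpi            # site-specific; an assumption here

# srun launches one process per task; Horovod picks up the MPI
# environment, so no explicit host list is needed.
srun python train.py
```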